Exploratory preferences explain the human fascination for imaginary worlds in fictional stories

Imaginary worlds are present and often central in many of the most culturally successful modern narrative fictions, be it in novels (e.g., Harry Potter), movies (e.g., Star Wars), video games (e.g., The Legend of Zelda), graphic novels (e.g., One Piece) and TV series (e.g., Game of Thrones). We propose that imaginary worlds are popular because they activate exploratory preferences that evolved to help us navigate the real world and find new fitness-relevant information. Therefore, we hypothesize that the attraction to imaginary worlds is intrinsically linked to the desire to explore novel environments and that both are influenced by the same underlying factors. Notably, the inter-individual and cross-cultural variability of the preference for imaginary worlds should follow the inter-individual and cross-cultural variability of exploratory preferences (with the personality trait Openness-to-experience, age, sex, and ecological conditions). We test these predictions with both experimental and computational methods. For experimental tests, we run a pre-registered online experiment about movie preferences (N = 230). For computational tests, we leverage two large cultural datasets, namely the Internet Movie Database (N = 9424 movies) and the Movie Personality Dataset (N = 3.5 million participants), and use machine-learning algorithms (i.e., random forest and topic modeling). In all, consistent with how the human preference for spatial exploration adaptively varies, we provide empirical evidence that imaginary worlds appeal more to more explorative people, people higher in Openness-to-experience, younger individuals, males, and individuals living in more affluent environments. We discuss the implications of these findings for our understanding of the cultural evolution of narrative fiction and, more broadly, the evolution of human exploratory preferences.

www.nature.com/scientificreports/ genetically inherited personality trait often called Openness-to-experience [56][57][58] . It constitutes one of the five dimensions within the Big Five, the model of human personality 57,59 . The five dimensions that compose it (i.e., Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) have been designed to capture the universal variability of human personalities and behaviors: humans differ in the personality "scores" associated with each of these dimensions. The Big Five is considered the most widely accepted model of human personality today [60][61][62][63][64] . The Openness trait is correlated with novelty-seeking behavior 38,56,65 , a preference for creativity [66][67][68][69][70][71] , spatial cognitive capacities [72][73][74][75] , a preference for using maps 76 , a preference to explore a system 77,78 , and innovative deviations from observed demonstrations in learning tasks 79 . In other words, people higher in Openness-to-experience are overall more curious and explorative 80 . In the cultural domain, the Openness trait is correlated with the liking of adventure movies, fantasy movies, and science fiction movies 81 , the enjoyment of abstract art 82 , and the preference for jazz, blues, classical, rock, alternative, and folk music [83][84][85] . It also correlates with some cultural practices reported by people, such as going to the theatre, art galleries, or museums 86 , or seeking novel food 87 . We, therefore, predict that people higher in Openness should be, on average, fonder of fictional stories set in imaginary worlds.
The sensitivity of exploratory preferences varies according to the developmental stage of the individual. Evolutionary developmental psychology explains why: younger individuals have more to learn from the world, so it is more adaptive for them to explore their environments and try to reduce knowledge gaps [88][89][90] . A complementary explanation posits that the evolutionary costs associated with exploration (e.g., resource shortage risk) are outweighed by parental caregiving investments 91,92 . This can be seen as an adaptive feedback loop or as an adaptive developmental division of labor [93][94][95] .
There is already much experimental evidence that children are indeed more curious and eager to explore than adults (see 96 , for a review). Children are more explorative than adults in foraging tasks [97][98][99] , in bandit tasks 100 , in explanation-seeking tasks 101,102 , in search tasks 103 , in decision-making tasks 104,105 , in problem-solving tasks 106 , in causal-learning tasks 107,108 , and in change-detection and visual search tasks 109 . Importantly, such behavioral data from experimental research and computational modeling show that children are not merely prone to random sampling behavior: they show clear patterns of directed exploration 110 . Another study found that intellectual curiosity is negatively correlated with age, after controlling for education level, sex, and culture 111 . In a foraging task, adolescents explored more and more optimally than adults did 112 , consistent with other findings suggesting that adolescents too are more motivated to explore novel, although risky, scenarios than adults [112][113][114][115][116][117] .
In accordance with such findings, Openness has been shown to decline with age across countries [118][119][120][121][122] . We, therefore, predict that younger people should be, on average, fonder of fictional stories set in imaginary worlds.
The sensitivity of exploratory preferences varies according to an individual's biological sex 123 . Selection pressures for exploratory preferences and abilities have been stronger for males in many terrestrial vertebrates, and more specifically in many mammalian species, because males and females differ in their mating patterns for access to mates (caused by differences in reproductive variance between the sexes): in this view, spatial exploration is thought of as a male reproductive strategy [124][125][126] . For instance, in humans, there is evidence that among the Tsimane (a forager-horticulturalist people in Bolivia), males travel more than females, and even more so during periods of intensive mate search 127 , but not earlier in ontogeny 75 .
Another evolutionary rationale posits that, in humans specifically, exploratory preferences and abilities contributed differentially to the reproductive success of males and females because of the sexual division in foraging activities: males would have specialized in solving spatial problems associated with hunting (which requires a propensity to explore unfamiliar environments) while females would have specialized in solving spatial problems associated with plant gathering (which requires a propensity to learn and remember object locations 128,129 ). Both rationales can explain why, in humans, males develop higher spatial abilities specifically related to exploration than females (see 130,131 for meta-analyses, see 132 for a review) and navigate in wider ranges than females ( 133,134 , but see 135 ).
Lastly, another complementary hypothesis proposes that males and females evolved different cognitive preferences and skills related to inventorying and classifying features when exploring the physical world, partly because of a male specialization in tool-use: 'systemizing', the drive to explore and understand systems, would have had a greater impact on the reproductive success of males than on that of females 77,136 . There is evidence that males score higher in systemizing 78,137,138 , that fetal testosterone levels positively correlate with a highly restricted range of interests, which is a marker of both high systemizing and high-functioning autism 139 , and that there are more males with either higher systemizing quotients or autistic traits who are interested in non-social domains of knowledge, such as engineering, mathematics, or science 78,[140][141][142] .
Overall, three different (and non-mutually exclusive) evolutionary hypotheses propose that some adaptive challenges human males specifically faced during their evolution (i.e., searching for mates, hunting, or using tools) led males to be more systematically curious about their spatial (and non-social) environments while leading females to be more systematically curious about their social environments. This is consistent with the findings that (1) in modern societies (here the United States, with experimental evidence from 320,000 participants) males score higher in the personality trait Openness-to-experience 143 , and (2) in hunter-gatherer societies (here, the Hadza of Tanzania, with evidence from GPS data), males explore more land, follow more sinuous paths, walk further per day 144 , and also perform better in three tests of spatial ability 145 . We, therefore, predict that males should be, on average, fonder of fictional stories set in imaginary worlds.
Finally, exploratory preferences are hypothesized to vary according to the local ecology of an individual. Exploration is most valuable and adaptive in more affluent, safer, and therefore more predictable environments [146][147][148] . Why? In unsafe and poor ecologies, exploration is very risky, notably because if exploration does not pay off, one is left with nothing. Relatedly, the opportunity costs of exploration are higher in scarcity because one is better off exploiting one's environment to provide for more pressing needs. Conversely, in more affluent, safer, and predictable ecologies, such risks are lower: notably, when surrounded by more resources, individuals can afford to lose some of them in the short term 149 . Therefore, exploration is best defined as a 'venture behavior', that is, a preference for a high variance of rewards over short-term gains (as opposed to 'hazardous behavior' 150 ). Since organisms evolved in changing environments 151 , selection pressure would have favored exploratory preferences that are highly responsive to the local ecology, with time horizon as the crucial mediator 111,149,152 . More generally, phenotypic plasticity enables an organism to adapt to new situations and environments by changing its behavior, rather than having to wait for genetic adaptations to occur. This behavioral flexibility can be an advantage in unpredictable or changing environments, allowing organisms to survive and reproduce more effectively. The behavioral effect of local ecological cues on exploratory preferences, curiosity, and spatial search strategies 153 is observed in a wide range of species, such as orangutans [154][155][156] , honeybees 157 , parrots 158 , and chickadees 159 . It is parsimoniously hypothesized that it also applies to humans 146,160 .
At the individual level, there is empirical evidence that people from richer families score higher in Openness-to-experience 161,162 and that people with higher income at one stage of their life are less likely to decrease in Openness-to-experience later on 163 . In a foraging task, people with more adverse childhood experiences remain in patches longer and, thus, explore less 164 . At the level of societies, recent empirical studies show that, across the world, people living in more affluent countries exhibit higher levels of openness to change and new experiences [165][166][167] . Finally, a recent study shows that between-country differences in levels of causal learning and pretend play in children (i.e., the United States vs. Peru) are similar to within-country differences associated with socio-economic status (i.e., mixed-SES United States vs. low-SES United States 168 ). We, therefore, predict that people living in more affluent local ecologies should, on average, be fonder of fictional stories set in imaginary worlds.
We reviewed evolutionary rationales and empirical evidence showing how the human motivation to explore (i.e., environmental curiosity) adaptively varies according to people's personality traits, age, sex, and ecological conditions ( Fig. 1).
Our hypothesis, therefore, leads to fine-grained predictions based on the adaptive variability of environmental curiosity 5,20 . If imaginary worlds do exploit this cognitive mechanism, the sources of its adaptive flexibility that we reviewed should account for the variability in the human fascination for imaginary worlds, across time and populations.

Study 1: Unsupervised clustering of movies.
Before testing predictions about the sources of variability of the preference for imaginary worlds, we directly investigate whether fictional stories with imaginary worlds are related to exploration. We test that (1) stories with imaginary worlds constitute a well-identified cluster in the global set of movies produced and that (2) this emerging cluster is related to environmental curiosity through exploration-related content. We use two machine-learning algorithms independently. The random-forest algorithm, trained on the plot keywords of manually annotated movies, is designed to detect imaginary worlds in a sample of 9424 movies. This algorithm successfully identifies movies set in imaginary worlds, with an out-of-bag error rate of 9.35%. In parallel, we combine natural language processing techniques (i.e., a Sentence-BERT (SBERT) transformer) and topic modeling methods to project those 9424 movies into a semantic latent space that embeds their plot summaries: the closer movie summaries are in meaning and content, the closer the movies are in this space. Seven clusters naturally emerged, and we extracted the most specific n-grams to describe their content (see Fig. 2). Combining both algorithmic methods, we show that at least one of the emerging clusters more specifically embeds movies with imaginary worlds, as identified by the random-forest algorithm. First, we find a significant relationship between cluster membership and being detected as a movie with an imaginary world (χ²(6, N = 9424) = 576.754, p < 0.001). We, therefore, reject the null hypothesis that the two variables are independent of each other: movies with imaginary worlds are not randomly distributed across clusters (Fig. 3).
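The χ² statistics in this study are tests of independence on contingency tables of cluster membership by a binary tag: a 7 × 2 table gives 6 degrees of freedom, and collapsing it to "one cluster versus all the others" gives 1. A minimal pure-Python sketch of a Pearson test follows. It assumes no continuity correction (the text does not say whether one was applied), and its p-value uses closed forms that are valid only for df = 1 and even df, which covers both cases reported here. The toy counts are invented for illustration and are not the study's data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses closed forms only: df == 1 via the complementary error function,
    even df via the finite Poisson sum. Other df are rejected.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0  # k = 0 term of sum_k (x/2)^k / k!
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("sketch only supports df == 1 or even df")


def chi2_independence(table):
    """Pearson chi-square test of independence (no continuity correction).

    table: list of rows of observed counts, e.g. clusters x [untagged, tagged].
    Returns (statistic, degrees of freedom, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def one_vs_rest(table, k):
    """Collapse an r x c table to 2 x c: row k versus all other rows pooled."""
    others = [row for i, row in enumerate(table) if i != k]
    return [list(table[k]), [sum(col) for col in zip(*others)]]


if __name__ == "__main__":
    # Invented toy counts: 3 clusters x [no imaginary world, imaginary world].
    toy = [[30, 10], [5, 45], [8, 42]]
    print(chi2_independence(toy))                  # df = 2
    print(chi2_independence(one_vs_rest(toy, 0)))  # df = 1
```

For the seven-cluster table, `chi2_independence(one_vs_rest(table, k))` gives the 1-df version for cluster k. In practice a library routine such as SciPy's `chi2_contingency` is the usual choice; note that it applies Yates' continuity correction to tables with one degree of freedom by default.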
We then perform the same analysis to show that one specific cluster (cluster 1, hereafter the 'imaginary-world cluster') specifically embeds movies with imaginary worlds (χ²(1, N = 9424) = 1542.759, p < 0.001). In fact, 71% of the movies with imaginary worlds detected by our algorithm belong to this cluster. Even if 'only' 30% of the movies in this cluster have imaginary worlds, this compares with only 2%, on average, in the other clusters. Note that our algorithm is conservative: it misses many imaginary worlds (i.e., false negatives), but it is unlikely to wrongly label a movie that is not set in an imaginary world as having one (i.e., false positives; see "Supplementary Materials"). In all, movies with imaginary worlds are similar enough in their content for an unsupervised algorithm to group them in one cluster, based only on plot summaries. This is also qualitatively observable in the n-grams that are most specific to this cluster, blending words related to multiple genres such as fantasy (e.g., 'dragon'), science fiction (e.g., 'alien'), dystopia (e.g., 'survivor'), and, more broadly, the supernatural (e.g., 'vampire'). Finally, we show that this imaginary-world cluster specifically embeds movies with exploration-related content, and significantly more so than any other cluster. Each movie summary is assigned a binary exploration-relatedness variable, based on an exact match between at least one word from an algorithmically generated list of exploration-related words and the words of the movie summary (see "Methods"). There is a significant relationship between cluster membership and exploration-related content (χ²(6, N = 9424) = 75.035, p < 0.001). We, therefore, reject the null hypothesis that the two variables are independent of each other: movies with exploration-related content are not randomly distributed across clusters (Fig. 3). We perform the same analysis again and show that the imaginary-world cluster specifically embeds movies with exploration-related terms in their plot summaries (χ²(1, N = 9424) = 73.946, p < 0.001).
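The exploration-relatedness flag is described as an exact match between at least one word of a generated list and the words of a summary. A minimal sketch of that rule follows. The word list is a hypothetical placeholder (the study's algorithmically generated list is not reproduced in this section), and lowercase alphabetic tokenization is our assumption. The `flag_table` helper, also hypothetical, turns the flags into the cluster-by-flag counts that a χ² test of independence takes as input.

```python
import re

# Hypothetical lexicon for illustration only; not the study's generated list.
EXPLORATION_WORDS = {"explore", "exploration", "journey", "quest", "expedition", "discover"}

WORD_RE = re.compile(r"[a-z]+")


def is_exploration_related(summary, lexicon=EXPLORATION_WORDS):
    """True if at least one lowercase token of the summary exactly matches a lexicon word."""
    tokens = set(WORD_RE.findall(summary.lower()))
    return not tokens.isdisjoint(lexicon)


def flag_table(summaries, clusters, n_clusters):
    """Count [not related, related] summaries per cluster (rows indexed by cluster id)."""
    table = [[0, 0] for _ in range(n_clusters)]
    for text, cluster in zip(summaries, clusters):
        table[cluster][int(is_exploration_related(text))] += 1
    return table


if __name__ == "__main__":
    summaries = ["Two siblings explore a hidden valley.", "An office drama.", "A long journey north."]
    print(flag_table(summaries, [0, 1, 0], 2))
```

Because matching is exact, an inflected form such as 'explorer' is not caught unless it is itself in the list, which is one way a lexicon-based flag can undercount exploration-related summaries.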
Consistent with our general hypothesis, these results suggest that fictions with imaginary worlds resemble each other, at least in part because they are related to exploration. Note that this study also serves as an external validity test for the random-forest algorithm: it successfully identifies movies with imaginary worlds that, in addition, resemble each other in their content. We will use its tagging of movies with imaginary worlds in the next study.

Study 2: Demographic and psychological characteristics of individuals who 'like' movies with imaginary worlds on Facebook.
We now turn to specific predictions about the variability of the fascination for imaginary worlds in stories, which we derived from the adaptive sources of variability of human environmental curiosity (see "Introduction"). We predicted that people higher in Openness-to-experience, younger people, males, and people living in affluent local environments would be more likely to enjoy fictional stories set in imaginary worlds. We used the Movie Personality Dataset (MPD), which aggregates averaged personality (i.e., Big Five) and demographic traits (i.e., sex, age) from the Facebook myPersonality Database (N = 3.5 million 81 ). We couple this dataset with the output of the random-forest algorithm, which efficiently identifies movies as set in an imaginary world or not (see "Study 1"). First, we found that, as predicted, movies with imaginary worlds on Facebook are liked by an audience that is, on average, higher in Openness-to-experience than the audience of movies with no imaginary worlds (ß = 0.12, p < 0.01, CI [0.02, 0.22], Cohen's d = 0.24; Fig. 4). In other words, approximately 60% of movies with imaginary worlds have higher aggregated Openness-to-experience scores than the mean Openness-to-experience score of movies with no imaginary world. Although our hypothesis makes no specific prediction about them, we report the correlations between the four other traits of the Big Five and the liking of movies with imaginary worlds. Second, we found that movies with imaginary worlds on Facebook are liked by an audience that includes, on average, a higher proportion of males than the audience of movies with no imaginary worlds (ß = 0.44, p < 0.001, CI [0.31, 0.57], Cohen's d = 0.68; Fig. 4). This means that there is a 68.5% chance that a movie with an imaginary world picked at random will have a higher percentage of males liking it on Facebook than a movie with no imaginary world picked at random.
We found no significant association between the age of consumers and the presence of imaginary worlds in movies (ß = −0.005, p = 0.33, CI [−0.015, 0.0051], Fig. 4). Finally, in the full model, with all three variables of interest as explanatory variables and the liking of movies with imaginary worlds as the outcome variable, we found significant coefficients in the predicted directions (Fig. 5).
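The two verbal translations of Cohen's d above correspond to standard conversions that assume normally distributed scores with equal variances in both groups: the share of one group above the other group's mean is Cohen's U3, Φ(d), and the chance that a random member of one group exceeds a random member of the other is the common-language effect size, Φ(d/√2). The short sketch below uses only the Python standard library to check the reported figures; `cohens_d` uses a pooled standard deviation, which is our assumption about how d was computed.

```python
import math
from statistics import NormalDist, mean, variance

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def cohen_u3(d):
    """Share of group A above the mean of group B (normal scores, equal variances)."""
    return STANDARD_NORMAL.cdf(d)


def prob_superiority(d):
    """P(random member of A > random member of B), the common-language effect size."""
    return STANDARD_NORMAL.cdf(d / math.sqrt(2.0))


if __name__ == "__main__":
    print(round(cohen_u3(0.24), 3))          # ≈ 0.595, the "approximately 60%" for Openness
    print(round(prob_superiority(0.68), 3))  # ≈ 0.685, the "68.5% chance" for sex
```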
With computational methods, we provide observational evidence that people who like movies with imaginary worlds on Facebook are overall higher in Openness-to-experience. These results are in line with existing empirical evidence showing an association between Openness-to-experience and the consumption of, or preference for, specific genres often associated with imaginary worlds, such as science fiction and fantasy 81,[169][170][171][172][173] . Also consistent with our predictions, we provide evidence that people who report liking movies with imaginary worlds are more likely to be males. We did not find any significant association between age and the liking of movies with imaginary worlds. This can be explained by the very restricted age range of Facebook users at the time the data were collected (in 2009-2010): the aggregated ages associated with each movie range from 17.9 to 32.7 years. We would need a much larger range to assess the impact of age on the consumption of fictional stories with imaginary worlds. Note that evolutionary developmental psychology does not make any strong predictions about changes in the sensitivity of environmental curiosity within this specific life stage. Rather, it makes predictions about differences in the sensitivity of environmental curiosity between this life stage and earlier and later ones 174 . Further research should investigate the differences in cultural preferences between children and other life stages. Finally, the results of this study about the personality traits and biological sex of such audiences are generalizable only within the specific age range of this dataset.

Study 3: Demographic and psychological characteristics of individuals who self-report liking stories set in imaginary worlds.
We now turn to experimental tests of the same predictions (all pre-registered).
We asked participants to report their preferences for fictional stories with imaginary worlds using a questionnaire and to complete a range of psychometric questionnaires (see "Methods"). We predicted that people higher in Openness-to-experience, younger people, males, and people living in affluent local environments would be more likely to enjoy stories set in imaginary worlds. We take advantage of experimental paradigms to further test two other hypotheses. We test the presumably complementary 'systemizing hypothesis', which suggests that people enjoy imaginary worlds because they like to understand how newly presented imaginary worlds are structured and operate (i.e., because they are 'higher systemizers'). We also test the alternative and widespread 'escapist hypothesis', which posits that people enjoy imaginary worlds because they want to escape the difficulties of the real world: we look at whether people who report having more difficulties in life also report enjoying fictional stories with imaginary worlds more. We predicted that this would not be the case. We report the results in Fig. 6. First, we tested our core prediction about the association between the preference for imaginary worlds and environmental curiosity: people who score higher on the Curiosity and Exploration Inventory scale report enjoying imaginary worlds in fictional stories more (Fig. 7). Then, we tested predictions related to the adaptive variability of environmental curiosity and found that participants who report liking imaginary worlds in stories more are overall higher in Openness-to-experience, younger, and more likely to be males (Fig. 8). Although participants with higher socio-economic status scored significantly higher on the Curiosity and Exploration Inventory scale, they did not report enjoying imaginary worlds more.
This is consistent with the hypothesis that phenotypic plasticity does impact exploratory preferences (which increase as the local ecology becomes more affluent and predictable), but it suggests that this effect may not translate into the cultural domain. It might also be that this test was affected by the reduction of the sample size due to participant exclusion.
Figure 6. Summary of the predictions and results of the experimental study with the self-reporting paradigm, as pre-registered. We removed from the pre-registration 2 mediation tests that we could not perform (see Preregistration).
Let us finally turn to the two other hypotheses that we tested. First, it does not seem that stories with imaginary worlds are enjoyed because they allow consumers to 'escape' the difficulties of the real world. We tested this prediction by looking at the correlation between a self-reported measure of well-being and the preference for imaginary worlds. We reasoned that, if this hypothesis were true, the unhappier people are, the more they should like imaginary worlds. As predicted, this association turned out to be non-significant. Of course, more empirical tests should be run, but this first result suggests that the 'escapist hypothesis' is either false or incomplete. Finally, we tested the effect of levels of systemizing on the preference for imaginary worlds. This supplementary hypothesis was confirmed: the higher people are in systemizing, the higher they score on the Curiosity and Exploration Inventory scale and the more they report enjoying imaginary worlds. With a mediation analysis, we found that the systemizing quotient mediated 70% of the effect of sex on the preference for imaginary worlds: while this is not a causal paradigm, this result is consistent with the hypothesis that males enjoy imaginary worlds more in large part because they are higher in systemizing (Fig. 9). This study replicates previous findings of sex differences in systemizing, with similar magnitude 77,78 . We further demonstrate that this difference translates into the cultural domain, affecting the reported preference for imaginary worlds in fiction.
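The "70% mediated" figure is a proportion mediated, commonly computed as the indirect effect divided by the total effect, a·b / c. The sketch below illustrates this product-of-coefficients estimate with ordinary least squares in pure Python. The section does not say which estimator, covariates, or confidence-interval method (e.g., bootstrapping) the authors used, so this is a generic illustration only, with `x` standing for sex (a hypothetical 0/1 coding), `m` for the systemizing quotient, and `y` for the preference score.

```python
def _centered(values):
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mediation_ols(x, m, y):
    """Product-of-coefficients mediation with OLS (intercepts handled by centering).

    Returns the total effect c (Y ~ X), the path a (M ~ X), the direct effect
    c_prime and the path b (both from Y ~ X + M), and the proportion a*b / c.
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)

    c = sxy / sxx  # total effect: simple regression of Y on X
    a = sxm / sxx  # path a: simple regression of M on X

    # Y ~ X + M: solve the 2x2 normal equations on the centered data.
    det = sxx * smm - sxm ** 2
    c_prime = (sxy * smm - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det

    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "prop_mediated": a * b / c}
```

With OLS on the same observations, the total effect decomposes exactly as c = c′ + a·b, which is why a·b / c can be read as a share of the total effect; the ratio can fall outside [0, 1] when the direct and indirect effects have opposite signs.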
In addition, we show that the psychological trait of systemizing is highly correlated with exploratory preferences, supporting the hypothesis that systemizing is subsumed under the drive to explore that we have called environmental curiosity 175 . Under our theoretical account, systemizing labels an extreme form of information processing that this mechanism of curiosity about the physical world can take, one that is indeed modulated by sex differences because of ancestral selection pressures. Crucially, this account may explain why fans of imaginary worlds like to explore imaginary worlds in depth rather than exploring new worlds again and again 176 , and why they end up remembering and storing huge amounts of information related to the imaginary worlds they like, for instance in Wikipedia-like online 'Fandoms' (e.g., the Star Wars online fandom aggregates more than 175,000 pages 1 ).

Discussion
In all, we provide empirical evidence supporting the hypothesis that exploratory preferences explain why humans are fascinated by imaginary worlds in fictional stories. We reviewed the evolutionary psychological literature on environmental curiosity and exploratory preferences in humans. Then, we showed that fictions with imaginary worlds cluster together because of the semantic proximity of their plot summaries, suggesting that they resemble each other in their content. We showed that movies from this cluster are specifically associated with exploration-related content. As predicted by the exploration hypothesis 5 , we then provided evidence that the adaptive variability of the sensitivity of environmental curiosity reflects, and therefore likely explains, the variability of the preference for imaginary worlds in fiction. Observational analyses of large cultural datasets showed that people who 'like' movies with imaginary worlds on Facebook are overall higher in Openness-to-experience, younger, and more likely to be males (when controlling for the two other variables). This dataset may be biased, as it aggregates the personality scores of people who decided to create a Facebook account and 'like' movies on Facebook. Therefore, we replicated these findings with experimental methods: we provided consistent evidence that participants who report enjoying imaginary worlds in movies, novels, and video games are overall higher on a scale of exploratory preferences, higher in Openness-to-experience, younger, higher in systemizing, and more likely to be males. We did not find an association between the socio-economic status of the participants and their reported preference for imaginary worlds, although participants higher in socio-economic status scored significantly higher on the scale of exploratory preferences.
The core prediction that, synchronically, people's socio-economic level should affect their preference for fictional stories set in imaginary worlds, through a mediating effect of environmental curiosity, should be further tested with other datasets or experiments. Future research should also study the causal impact of ecological conditions on the production of speculative fictional stories diachronically, at the macro-level of societies.
Cognitive scientists have long argued that universal cognitive adaptations can explain the evolution, stabilization, and distribution of cultural traits 21,[177][178][179] . Here we demonstrate that the way such cognitive adaptations vary (between individuals, across ontogeny, and with changes in local ecologies) can explain the variable appeal of such cultural traits to human cognition. More specifically, in the study of fictional stories, some researchers have focused on universal appeal for some content features, making some stories more popular than others ( 180-183 , e.g., for romance [184][185][186][187][188] , e.g., for horror 189 ). We contend that evolutionary psychology now provides predictions and powerful ways to interpret findings about the differences and changes in human cultural preferences (e.g., for romance 158,159 , e.g., for horror 160 ). Further research in fiction studies could investigate the variability of many other preferences (and associated consummatory behavior) within such an evolutionary framework.
Beyond the field of entertainment, the success of imaginary worlds in modern societies reveals important changes in individual preferences and personality traits. Why would people come to enjoy stories with imaginary worlds now, and not before? Because we have provided empirical evidence that the appeal of imaginary worlds relies on exploratory preferences, the increasing success of fiction with imaginary worlds may reflect changes in human exploratory preferences. We proposed that humans universally become more curious and explorative as they live in more affluent ecologies, notably because the evolutionary costs of curiosity decrease in such environments. This hypothesis did not lead to significant results when comparing people's preferences for imaginary worlds at different socio-economic levels. However, this could mean that people process cues other than sheer income to assess how well-off they are (e.g., country-level cues, such as unemployment insurance). If our hypothesis is true, the economic growth of the last decades, or even of the last centuries, in most human societies likely fueled an ever larger audience for stories set in imaginary worlds, and producers of fiction could therefore invest more and more in the creation and refinement of such worlds 190 .
It is worth noting that this hypothesis fits qualitative observations about the cultural evolution of imaginary worlds at the country level. Modern stories with imaginary worlds first became popular in the United Kingdom 4 , which was at the time the leading country in terms of GDP per capita 191 , and then developed mostly in the Euro-American sphere. By contrast, for most of the nineteenth and twentieth centuries, the popularity of imaginary worlds was rather limited in less economically developed countries. For instance, while Jules Verne was first translated into Chinese in the early twentieth century and inspired Chinese writers to write science-fiction and fantasy stories during the late Qing dynasty and early Republican era, stories set in imaginary worlds remained marginal in Chinese literature during the twentieth century 192,193 . In East Asia, imaginary worlds first became mainstream in Japan in the 1950s 194,195 , which had started its industrialization in the late nineteenth century, then in Hong Kong and Taiwan 196 , which had started to develop economically in the 1970s. During the same period, imaginary worlds were much less popular in mainland China 192,197 , where they became mainstream at the turn of the new millennium, that is, 20 years after the take-off of the Chinese economy [196][197][198][199] .
While future empirical research should thoroughly test this hypothesis, there are already some indications in its favor. For instance, recent studies on the evolution of personality traits have shown an increase in Openness-to-experience in high-income countries, in both Western 200 and Eastern societies 201 . However, these studies are limited in sample size, population diversity, and measurement. If we are right, the rise of imaginary worlds in all parts of the world would suggest that Openness-to-experience is rising in modern societies and has been rising for at least 150 years. That is, the evolution of the relative production of stories with imaginary worlds could serve as a proxy for changes in human exploratory preferences. Our results can therefore contribute to the understanding of behavioral and cultural changes over the long run 146,166,202 .

Data. Extraction of existing data from the Internet or previous studies (Studies 1 & 2). We use the Internet Movie Database (IMDb) to obtain metadata about 9424 movies, such as their genres, plot summaries, and keywords. Nave et al. 81 built an important dataset, the Movie Personality Dataset (MPD), which makes it possible to map the associations between movie characteristics and the characteristics shared by the people who like these movies. In the MPD, for each of the 846 movies, we have (1) the movie metadata, (2) the average personality traits, average age, and average sex of the people who like it on Facebook, and (3) the presence (or absence) of an imaginary world in it (see "Algorithmic methods", below). Note that because socio-demographic scores are aggregated, the sex variable associated with each movie becomes a continuous variable between 0 and 1 (the proportion of males among the people who liked it, where 1 would mean that all people who liked this movie on Facebook self-identified as men).
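Because the MPD reports movie-level averages rather than individual records, the sex variable behaves like a proportion. The sketch below uses made-up liker-level records (the record layout and the names `LIKES` and `aggregate_by_movie` are hypothetical, not the MPD's actual format) to show how averaging a 0/1 male indicator per movie yields a continuous value between 0 and 1, alongside mean Openness and mean age.

```python
from statistics import mean

# Hypothetical liker-level records: (movie, openness, age, is_male with 1 = male).
# The MPD itself provides only the aggregated, movie-level values.
LIKES = [
    ("Movie A", 4.2, 24, 1),
    ("Movie A", 3.8, 31, 0),
    ("Movie A", 4.5, 19, 1),
    ("Movie B", 3.1, 45, 0),
    ("Movie B", 3.5, 52, 1),
]


def aggregate_by_movie(records):
    """Collapse individual likers into one summary row per movie."""
    grouped = {}
    for movie, openness, age, is_male in records:
        grouped.setdefault(movie, []).append((openness, age, is_male))
    summary = {}
    for movie, rows in grouped.items():
        summary[movie] = {
            "mean_openness": mean(r[0] for r in rows),
            "mean_age": mean(r[1] for r in rows),
            # Averaging a 0/1 indicator gives the share of male likers, in [0, 1].
            "prop_male": mean(r[2] for r in rows),
        }
    return summary


if __name__ == "__main__":
    for movie, row in aggregate_by_movie(LIKES).items():
        print(movie, {k: round(v, 3) for k, v in row.items()})
```

Here "Movie A" gets a male share of 2/3 and "Movie B" a share of 0.5, which is exactly the kind of continuous sex score used in the movie-level analyses.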

Collection of original data through experimental designs (Study 3).
The design and predictions for this study were pre-registered (https://osf.io/8yj3v). All methods were carried out in accordance with relevant guidelines and regulations and were approved by the Conseil d'évaluation éthique pour les recherches en santé (CERES n°201,659). Informed consent was obtained from all participants. We recruited 350 participants from the online research participation platform Prolific (180 males, 165 females, 4 others; M age = 46, SD age = 19.5). We removed participants who failed the attention check or did not respond to the follow-up study, leaving a total sample of 230 participants (101 males, 127 females, 2 others; M age = 48, SD age = 16.3, range: 19-82). We also ran the analyses that did not require the follow-up study on the full sample, after removing participants who failed the attention check (see "Supplementary Materials"). Our pre-registered sample size was higher (319) for 95% power (with α = 0.2 and p < 0.05), but with 230 participants the statistical power remains above 80%.
In the first part of the experimental study, three paradigms aimed to capture an individual score of preference for fictional stories set in imaginary worlds: rating randomly created movie plots, choosing between two movies, and directly self-reporting this preference. Because the last, self-reporting paradigm is the only one that yielded results consistent with the large-scale observational study, we only report detailed results for it (see "Supplementary Information" for the results of the other paradigms). While we do not deny that such failures to find results consistent with our predictions across all pre-registered paradigms weaken the overall support for our findings, it is not surprising to us that the self-reporting method (used effectively in similar research; e.g., 203,204 ) has overall more predictive power than newly designed paradigms.
Moreover, when asked to choose between two movies or to rate a movie summary, people likely use many cues to make their choice, so the fact that a story takes place in an imaginary world might not be decisive. Conversely, the last paradigm, in which participants are directly asked whether they like movies, novels, or video games set in imaginary worlds, more precisely targets the content feature we want to study.
We believe that these paradigms have limitations that can explain such results. Our theory predicts that some content features go together better than others, because they tap into psychological preferences that share common cognitive and neural bases and are therefore present in the same people. Consequently, some of the randomly created movie plots would be 'objectively' better at appealing to a certain audience because, by chance, they combine locations and plots that are psychologically 'consistent', while others would not. This introduces a bias into the randomly created plots. Regarding the second paradigm, people likely take many elements into account when deciding which of two films they would prefer to watch. The presence of an imaginary world might make no difference to some people and might even be hard to detect from the cues we presented. We now think that such paradigms should control for all content features of movies if we want to be precise about what drives people to consume and enjoy some stories.
Note, however, that the three computed scores of preference for imaginary worlds all correlate significantly and positively with each other. This suggests that the (self-reported) specific preference for imaginary worlds drives actual choices of which movies to consume, but not strongly enough to yield significant results with our sample size (see "Supplementary Materials"). Further research should continue to seek more ecologically valid experimental paradigms to complement findings from self-reports. We believe that the associations between horror movies and morbid curiosity 204,205 and between movies with imaginary worlds and environmental curiosity could serve as benchmarks for testing whether new experimental paradigms successfully capture consumer preferences, before the methodology is expanded.
For the self-reporting paradigm, we first created an 8-item scale. A factor analysis (KMO sampling adequacy = 0.62; see "Supplementary Materials") indicated that the responses formed two clusters of items (χ²(13) = 60.19, p < 0.001). We removed the items that did not load onto factor 2, which was more specific to the preference for imaginary worlds (see "Supplementary Materials"). The resulting 4-item scale showed near-acceptable reliability (Cronbach's α = 0.66). Participants rated the four items on a 7-point Likert scale from 'I fully disagree' to 'I fully agree': (1) 'I like movies, novels and video games with more information about the world than about the characters', (2) 'I like movies, novels and video games in which the fictional characters explore their environment', (3) 'I like movies, novels and video games with novel and surprising technologies', and (4) 'I like movies, novels and video games which make me feel I am traveling in a foreign world'. The individual score of preference for imaginary worlds is the mean of the four ratings.
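The reliability figure and the individual score can be illustrated with the standard formula for Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores). This is a minimal stdlib sketch with toy ratings; the text does not state which software computed α = 0.66, and the function names here are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of ratings per item,
    with participants in the same order in every list."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Each participant's total score is the sum of their item ratings.
    totals = [sum(ratings) for ratings in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / total_var)


def preference_score(ratings):
    """Individual preference score: mean of one participant's 1-7 ratings."""
    return mean(ratings)


if __name__ == "__main__":
    # Toy ratings for 2 items x 3 participants (not study data).
    print(cronbach_alpha([[1, 2, 3], [1, 3, 2]]))  # 0.666...
    print(preference_score([5, 6, 4, 7]))          # 5.5
```

Items that always move together push α towards 1; weakly related items pull it down, which is why dropping the items that did not load onto the imaginary-world factor matters for the 4-item score.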
Then, the participants were also asked to respond to (1) the Big Five questionnaire BFI-10 206 , to measure the score of Openness-to-experience, (2) the Curiosity and Exploration Inventory-II 207 to measure exploratory preferences, (3) the short Warwick-Edinburgh Mental Well-being Scale 208 to assess well-being scores, (4) the 8-item version of the Systemizing Quotient 209 to measure scores of systemizing, (5) the childhood and current Socio-Economic Status (as designed in 210 ) as a proxy for the affluence of the local ecology, (6) their reported gender, and (7) their age.
Algorithmic methods. Random-forest algorithm (study 1 & 2). The first step is to detect the presence of imaginary worlds in movies. We start by manually coding 385 movies, randomly selected from the IMDb dataset, as set in an imaginary world or not. We base this decision on one main criterion: whether or not the IMDb movie summary mentions a location that does not exist in the real world. Then, we extend this categorization to the 9424 movies with a classification algorithm based on a random-forest method 211 and trained on plot keywords (i.e., user-generated keywords associated with movies, which describe "any notable object, concept, style or action that takes place during a title"). This algorithm identifies movies set in imaginary worlds with an out-of-bag error rate of 9.35%. Among the 328 movies manually annotated as not set in an imaginary world, the random-forest algorithm misclassified only 5, whereas among the 57 movies manually annotated as set in an imaginary world, it correctly identified 26: the algorithm therefore underestimates the number of movies with imaginary worlds. To further assess the external validity of this predictive algorithm, we showed that movies identified by the algorithm as set in an imaginary world were more likely to be classified in the science fiction and fantasy IMDb genres, two genres in which producers of fiction commonly classify fictions with imaginary worlds (see "Supplementary Materials" for the results).
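The reported counts can be turned into the usual classifier summaries. This sketch only does arithmetic on the numbers given in the text (the helper name `classification_summary` is hypothetical); it does not reproduce the random forest itself.

```python
def classification_summary(tp, fn, fp, tn):
    """Rates for a binary classifier where 'positive' = set in an imaginary world."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fn + fp) / total,
        "recall": tp / (tp + fn),       # imaginary-world movies recovered
        "specificity": tn / (tn + fp),  # real-world movies correctly left out
        "precision": tp / (tp + fp),    # flagged movies that are truly imaginary
        "actual_positives": tp + fn,
        "predicted_positives": tp + fp,
    }


# Counts taken from the text: 57 manually annotated imaginary-world movies
# (26 recovered) and 328 annotated real-world movies (5 misclassified).
summary = classification_summary(tp=26, fn=57 - 26, fp=5, tn=328 - 5)
for name in ("error_rate", "recall", "specificity", "precision"):
    print(f"{name:<12} {summary[name]:.2%}")
print("predicted vs actual positives:",
      summary["predicted_positives"], "vs", summary["actual_positives"])
```

The error rate comes out at (31 + 5) / 385 = 9.35%, matching the reported out-of-bag error, which is consistent with these counts being the out-of-bag confusion matrix. Recall is about 46% (26/57) while specificity is about 98% (323/328), and only 31 movies are flagged for 57 true cases, which is the underestimation described above.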
Topic modeling method (study 1). Independently of this first step, we use Natural Language Processing and Topic Modeling methods to project those 9424 movies into a semantic latent space. More specifically, we use the SBert Transformer [211][212][213][214] , which has been trained on large common-language corpora and can map words, sentences, and paragraphs to a multidimensional dense vector space (i.e., embeddings 215,216 ), and which achieves state-of-the-art performance on machine-learning tasks related to text understanding 217 . Such techniques allow us to define the semantic closeness of words, sentences, or paragraphs in an unsupervised fashion, by making the algorithm learn from the contexts in which words are used in common-language corpora. The underlying assumption is that words used in similar ways across very large corpora have similar meanings.
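"Semantic closeness" between two embeddings is commonly measured with cosine similarity, the measure typically used with SBERT vectors; the text does not name the measure, so treat this as an assumption. The minimal sketch below uses hypothetical 3-dimensional vectors; real SBERT embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings of three plot summaries.
space_saga = [0.9, 0.1, 0.0]
planet_quest = [0.8, 0.2, 0.1]
office_drama = [0.0, 0.1, 0.9]
print(cosine_similarity(space_saga, planet_quest))  # high: close in the space
print(cosine_similarity(space_saga, office_drama))  # low: far apart
```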
Here, we project movies into a semantic space using their movie descriptions: the semantically closer two movie summaries are, the closer the two movies are in this space. Then, we use the K-Means algorithm to partition this space into 7 clusters (with the elbow method 218 determining the number of clusters beyond which additional clusters add little explained variation). For every cluster, we compute the 20 most specific n-grams using the chi-squared statistic, thus providing the words that most specifically describe each cluster. For these computations, we use the Python 'bunkatech' package (https://github.com/charlesdedampierre/BunkaTopics).
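The authors rely on the bunkatech package for this step; the dependency-free sketch below is not that package's code, only an illustration of K-Means (Lloyd's algorithm) and of the elbow method on toy 2-D points standing in for SBERT vectors. The seeding rule is an assumption: a deterministic farthest-first start instead of the randomized k-means++ seeding most libraries use. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-first seeding (a simple stand-in for k-means++)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, within-cluster sum of squares)."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def elbow_curve(points, k_max):
    """WCSS for k = 1..k_max; the 'elbow' is where the drop levels off."""
    return {k: kmeans(points, k)[1] for k in range(1, k_max + 1)}


# Toy 2-D "embeddings" with three obvious groups (not movie data).
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
for k, wcss in elbow_curve(POINTS, 5).items():
    print(k, round(wcss, 2))
```

On these points the within-cluster sum of squares falls from 804 (k = 1) to 304 (k = 2) and 4 (k = 3), then barely moves, so the elbow sits at k = 3; reading the same kind of curve on the movie embeddings is how a choice such as 7 clusters is made.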
Creation of an extended list of exploration-related terms. First, we manually create a list of 5 core terms directly related to exploration (i.e., 'exploration', 'explorer', 'explorers', 'explores', 'exploring'). Then, we extend this list, again using the SBert Transformer, this time applied to the movie summaries. More specifically, for each core term we find the 20 closest words in the dataset of movie summaries itself (regardless of whether the movie is set in an imaginary world or belongs to the imaginary-world cluster), and then remove duplicates. We end up with 37 terms in this extended list of exploration-related words.
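The expansion step amounts to a nearest-neighbour search followed by deduplication. In this sketch, `expand_terms`, `cosine`, and the 2-D vectors are hypothetical stand-ins for SBERT embeddings of the summary vocabulary. The text does not say whether the five core terms themselves count towards the 37, so this version collects only neighbours (a core term enters the list only if it is a neighbour of another core term).

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_terms(core_terms, embeddings, top_n=20):
    """Union, without duplicates, of the top_n nearest words to each core term.

    `embeddings` maps every vocabulary word found in the summaries to a vector.
    """
    expanded, seen = [], set()
    for term in core_terms:
        target = embeddings[term]
        candidates = [w for w in embeddings if w != term]
        nearest = sorted(candidates, key=lambda w: cosine(target, embeddings[w]),
                         reverse=True)[:top_n]
        for word in nearest:
            if word not in seen:  # drop duplicates across core terms
                seen.add(word)
                expanded.append(word)
    return expanded


# Hypothetical 2-D word vectors standing in for SBERT embeddings.
EMB = {
    "exploring": (1.0, 0.0),
    "explores": (0.98, 0.05),
    "journey": (0.9, 0.2),
    "voyage": (0.85, 0.3),
    "kitchen": (0.0, 1.0),
}
print(expand_terms(["exploring", "explores"], EMB, top_n=2))
# ['explores', 'journey', 'exploring']
```

With 5 core terms and 20 neighbours each, up to 100 words are collected; because the core terms share many neighbours, deduplication shrinks the list sharply, which is how 37 terms can remain.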
Statistical models. Chi-squared tests of independence (study 1). We combine both algorithmic methods (see "Algorithmic methods"): we use the chi-squared test of independence to check whether at least one cluster that emerged from the Topic Modeling method contains a disproportionate share of movies with imaginary worlds, as identified by the random-forest algorithm. In other words, we look at the correspondence between two computationally derived features of the 9424 movies: belonging to an emergent cluster and being detected as a movie with an imaginary world. We use the same test to check whether the imaginary-world cluster contains a disproportionate share of movies with exploration-related terms. In other words, we look at the correspondence between two computationally derived features of movies: belonging to the imaginary-world cluster and the binary variable of exploration-relatedness.
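For reference, Pearson's statistic for an r × c contingency table (for instance 7 clusters × imaginary world yes/no, giving 6 degrees of freedom) can be sketched as follows. The counts are hypothetical, and the exact p-value helper covers only the 1-degree-of-freedom case; for other tables a library routine such as SciPy's `scipy.stats.chi2_contingency` would normally be used (note that it applies Yates' continuity correction to 2 × 2 tables by default, unlike this sketch).

```python
import math


def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table
    of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if cluster membership and the feature were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical 2 x 2 table: rows = in cluster / not in cluster,
# columns = imaginary world / no imaginary world.
TABLE = [[10, 20],
         [20, 10]]
stat, dof = chi2_independence(TABLE)
print(round(stat, 3), dof, round(p_value_1df(stat), 4))
```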
Linear probability models (study 2). To test the correlations between the appeal of movies with imaginary worlds and the average Openness-to-experience, age, and sex of the people who like them, we use Linear Probability Models with each of these scores, in turn, as the explanatory variable and the binary presence or absence of an imaginary world as the outcome variable. We then fit one Linear Probability Model with all the scores as explanatory variables and the same binary outcome (see "Supplementary Materials", Appendix B, for model assumption checks).
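A linear probability model is ordinary least squares with a 0/1 outcome, so its slope reads directly as a change in probability. The one-predictor sketch below uses toy data and hypothetical names; the combined model described above would add the other predictors (e.g., with R's `lm()`). Because an LPM's errors are heteroskedastic by construction, heteroskedasticity-robust standard errors are often used with it.

```python
from statistics import mean


def fit_lpm(x, y):
    """Linear probability model with one predictor: OLS of a 0/1 outcome on x.

    Returns (intercept, slope); the slope is the change in the probability
    of the outcome per one-unit increase in x.
    """
    if any(v not in (0, 1) for v in y):
        raise ValueError("outcome must be coded 0/1")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical movie-level data: mean Openness of each movie's fans
# and whether the movie is set in an imaginary world (1) or not (0).
openness = [1, 2, 3, 4]
imaginary = [0, 0, 1, 1]
intercept, slope = fit_lpm(openness, imaginary)
print(intercept, slope)       # -0.5 0.4
print(intercept + slope * 1)  # about -0.1: an LPM can predict outside [0, 1]
```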

Linear models and t-tests (study 3).
To test predictions with the data from the experiment, we use (1) linear models with the score of preference for imaginary worlds as the dependent variable and, in turn, the score on the Curiosity and Exploration Inventory-II, the score of Openness-to-experience, age, socio-economic status, the Systemizing Quotient, and the Well-Being score as the independent variable; (2) linear models with the Curiosity and Exploration Inventory-II score as the dependent variable and, in turn, the score of Openness-to-experience, age, socio-economic status, and the Systemizing Quotient as the independent variable; and (3) t-tests with, in turn, the score of preference for imaginary worlds and the Systemizing Quotient as the dependent variable, and sex as a binary independent variable. We also perform a mediation analysis with the R 'mediation' package. Finally, we fit a non-preregistered linear model with the score of preference for imaginary worlds as the dependent variable and the scores of Openness-to-experience, sex, age, and socio-economic status as the independent variables.
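The text does not say whether the t-tests assume equal variances. R's `t.test` defaults to Welch's version (`var.equal = FALSE`), so this sketch assumes Welch's test and computes its statistic and Welch-Satterthwaite degrees of freedom on toy scores; the group labels are hypothetical. A p-value would then come from the t distribution, e.g., via R's `pt` or SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical preference-for-imaginary-worlds scores (1-7) in two groups.
group_1 = [5, 6, 7]
group_2 = [3, 4, 5]
t, df = welch_t(group_1, group_2)
print(round(t, 4), round(df, 2))  # 2.4495 4.0
```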

Data availability
Pre-registration, data, and R scripts are available on OSF: https://osf.io/zu2gq/?view_only=.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.